Publisher: ACM Press
Protection of copyrights and revenues of content owners in the digital world has been
gaining importance in the recent years. This paper presents a way of fingerprinting text
documents that can be used to identify content and expression similarities in documents,
as a way of facilitating tracking of digital copies of works, to ensure proper compensation
to content owners. The fingerprints we collected consist of surface, syntactic, and semantic
features of documents. Because they reflect mostly h ...
volume tracking, part-of-speech tagged features, semantic features, surface parsing,
Full text available: pdf(800.99 KB)

This paper surveys research areas relevant to cultural heritage digital libraries. The 
emerging National Science Digital Library promises to establish the foundation on which 
those of us beyond the scientific and engineering community will likely build. This paper 
thus articulates the particular issues that we have encountered in developing cultural 
heritage collections. We provide a broad overview of audiences, collections, and services. 
Full text available: pdf(1.48 MB)

This paper presents and evaluates the storage management and caching in PAST, a large-scale
peer-to-peer persistent storage utility. PAST is based on a self-organizing, Internet-based
overlay network of storage nodes that cooperatively route file queries, store
multiple replicas of files, and cache additional copies of popular files. In the PAST system, 
storage nodes and files are each assigned uniformly distributed identifiers, and replicas of 
a file are stored at nodes whose identifier matches ... 

4 'Memex' as an image of potentiality in information retrieval research and development 
Linda C. Smith 

June 1980 Proceedings of the 3rd annual ACM conference on Research and
development in information retrieval

Publisher: Butterworth & Co. 

Full text available: pdf(1.29 MB) Additional Information: full citation, references



Automatic recording agent for digital video server 
October 2000 Proceedings of the eighth ACM international conference on Multimedia
Full text available: pdf(932.31 KB)

We propose and evaluate the performance of a number of methods for automatic 
recording of TV programs for digital video servers, which estimate the user's preference 
over TV programs based on her/his past viewing behavior and automatically record a 
selected number of TV programs believed to be of interest to the user. Our methods 
combine the so-called content-based filtering and social (or collaborative) filtering 
methods and are based on a certain class of on-line learning algorithms known a ... 

Building a hypertextual digital library in the humanities: a case study on London 
Gregory Crane, David A. Smith, Clifford E. Wulfman 

January 2001 Proceedings of the 1st ACM/IEEE-CS joint conference on Digital 
Full text available: pdf(361.82 KB)
This paper describes the creation of a new humanities digital library collection:
11,000,000 words and 10,000 images representing books, images and maps on
pre-twentieth century London and its environs. The London collection contained far more
dense and precise information than the materials from the Greco-Roman world on which 
we had previously concentrated. The London collection thus allowed us to explore new 
problems of data structure, manipulation, and visualization. This paper contrast ... 
Additional Information: full citation, abstract, references, citings, index terms

Silver is an authoring tool that aims to allow novice users to edit digital video. The goal is
to make editing of digital video as easy as text editing. Silver provides multiple 
coordinated views, including project, source, outline, subject, storyboard, textual 
transcript and timeline views. Selections and edits in any view are synchronized with all 
other views. A variety of recognition algorithms are applied to the video and audio content 
and then are used to aid in the editing tasks. The ... 

Keywords: digital video editing, informedia, multimedia authoring, silver, video library 
Information retrieval on the web
June 2000 ACM Computing Surveys (CSUR), Volume 32 Issue 2
In this paper we review studies of the growth of the Internet and technologies that are
useful for information search and retrieval on the Web. We present data on the Internet 
from several different sources, e.g., current as well as projected number of users, hosts, 
and Web sites. Although numerical figures vary, overall trends cited by the sources are 
consistent and point to exponential growth in the past and in the coming decade. Hence it 
is not surprising that about 85% of Internet user ... 

Keywords: Internet, World Wide Web, clustering, indexing, information retrieval, 
knowledge management, search engine 



A survey of Web metrics 

Devanshu Dhyani, Wee Keong Ng, Sourav S. Bhowmick 

December 2002 ACM Computing Surveys (CSUR), Volume 34 Issue 4

Publisher: ACM Press 
Full text available: pdf(289.28 KB)

The unabated growth and increasing significance of the World Wide Web has resulted in a 
flurry of research activity to improve its capacity for serving information more effectively. 
But at the heart of these efforts lie implicit assumptions about "quality" and "usefulness" 
of Web resources and services. This observation points towards measurements and 
models that quantify various attributes of web sites. The science of measuring all aspects 
of information, especially its storage and retrieval or ... 

Keywords: Information theoretic, PageRank, Web graph, Web metrics, Web page 
similarity, quality metrics 
11 Recent Studies in Automatic Text Analysis and Document Retrieval
Many experts in mechanized text processing now agree that useful automatic language
analysis procedures are largely unavailable and that the existing linguistic methodologies 
generally produce disappointing results. An attempt is made in the present study to 
identify those automatic procedures which appear most effective as a replacement for the 
missing language analysis. A series of computer experiments is described, designed to 
simulate a conventional document retrieval environ ... 

12 PicASHOW: pictorial authority search by hyperlinks on the web 

January 2002 ACM Transactions on Information Systems (TOIS), Volume 20 Issue 1
Publisher: ACM Press

We describe PicASHOW, a fully automated WWW image retrieval system that is based on 
several link-structure analyzing algorithms. Our basic premise is that a page p displays 
(or links to) an image when the author of p considers the image to be of value to the 
viewers of the page. We thus extend some well known link-based WWW page retrieval 
schemes to the context of image retrieval. PicASHOW's analysis of the link structure 
enables it to retrieve relevant images even when those ... 

Keywords: Image retrieval, hubs and authorities, image hubs, link structure analysis 
Advances in the media and entertainment industries, for example streaming audio and
digital TV, present new challenges for managing large audio-visual collections. Efficient 
and effective retrieval from large content collections forms an important component of the 
business models for content holders and this is driving a need for research in audio-visual 
search and retrieval. Current content management systems support retrieval using
low-level features, such as motion, colour, texture, beat and ...

Keywords: content-based retrieval, sports video analysis, temporal models 
Research papers available on the World Wide Web (WWW or Web) are often poorly
organized, often exist in forms opaque to search engines (e.g. Postscript), and increase in 
quantity daily. Significant amounts of time and effort are typically needed in order to find 
interesting and relevant publications on the Web. We have developed a Web based 
information agent that assists the user in the process of performing a scientific literature 
search. Given a set of keywords, the agent uses Web s ... 

15 Supporting mobility in publish/subscribe middleware: Looking into the past:
enhancing mobile publish/subscribe middleware

M. Cilia, L. Fiege, C. Haul, A. Zeidler, A. P. Buchmann

June 2003 Proceedings of the 2nd international workshop on Distributed event-based
systems
Publisher: ACM Press 
Publish/subscribe (pub/sub) middleware facilitates loosely coupled cooperation and fits
well the needs of spontaneous, ad-hoc interaction. However, newly started mobile 
applications have to be bootstrapped to interpret the current flow of notifications correctly 
and commence normal operation. This problem is aggravated in mobile environments 
where disconnections and context changes occur frequently. In this paper, we propose two 
forms of subscriptions that allow consumers to subscribe to past eve ... 

16 Some research problems in automatic information retrieval 
G. Salton

June 1983 ACM SIGIR Forum, Proceedings of the 6th annual international ACM

SIGIR conference on Research and development in information retrieval 

SIGIR '83, Volume 17 Issue 4 

Publisher: ACM Press 

Full text available: pdf(766.54 KB) Additional Information: full citation, abstract, references

Information retrieval components are currently incorporated in several types of 
information systems, including bibliographic retrieval systems, data base management 
systems and question-answering systems. Some of the problems arising in the real-time 
environment in which these systems operate are briefly discussed. Certain recent 
advances in information retrieval research are then mentioned, including the formulation 
of new probabilistic retrieval models, and the development of automatic documen ... 

17 Integrating a dynamic lexicon with a dynamic full-text retrieval system 
Peter G. Anick, Rex A. Flynn 

July 1993 Proceedings of the 16th annual international ACM SIGIR conference on
Research and development in information retrieval
Publisher: ACM Press

There has been a great deal of interest within the Information Retrieval community in 
evaluating the use of linguistic knowledge to improve the indexing and searching of 
textual databases. Such systems must often employ a lexicon to store information about 
the words and phrases comprising the application's domain. Unlike a static lexicon, a 
dynamic lexicon raises practical concerns about the coordination between the state of the 
lexicon and IR indexing sche ... 

18 Case-based reasoning: Automatic summarisation of legal documents 
Claire Grover, Ben Hachey, Ian Hughson, Chris Korycinski

June 2003 Proceedings of the 9th international conference on Artificial intelligence
We report on the SUM project which applies automatic summarisation techniques to the
legal domain. We describe our methodology whereby sentences from the text are 
classified according to their rhetorical role in order that particular types of sentence can 
be extracted to form a summary. We describe some experiments with judgments of the 
House of Lords: we have performed automatic linguistic annotation of a small sample set 
and then hand-annotated the sentences in the set in order to explore the ... 

19 Building efficient and effective metasearch engines
Weiyi Meng, Clement Yu, King-Lup Liu

March 2002 ACM Computing Surveys (CSUR), Volume 34 Issue 1

Publisher: ACM Press

Frequently a user's information needs are stored in the databases of multiple search 
engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search 
engines and identify useful documents from the returned results. To support unified 
access to multiple search engines, a metasearch engine can be constructed. When a 
metasearch engine receives a query from a user, it invokes the underlying search engines 
to retrieve useful information for the user. Metasearch engines have ... 

Keywords: Collection fusion, distributed collection, distributed information retrieval, 
information resource discovery, metasearch 



20 Video retrieval: Feature extraction and content analysis for sports videos annotation
September 2001 Proceedings of the 2001 ACM workshops on Multimedia: multimedia
information retrieval 

Publisher: ACM Press 
This paper illustrates an approach to semantic video annotation in the specific context of
sports videos. Videos are automatically annotated according to elements of visual content 
at different layers of semantic significance. Unlike previous approaches, videos can 
include several different sports and can also be interleaved with non sport shots. Each 
shot is decomposed into its visual and graphic content elements, including foreground and 
background, objects, text captions, etc. Several differe ... 

Keywords: content-based video retrieval, sports videos, video annotation, video 
semantics 
[ppt] The JISC Information Environment

File Format: Microsoft Powerpoint 97

finding stuff across multiple content providers, access, streamlining access to appropriate
copy, content providers expose metadata about their content for ...
www.ukoln.ac.uk/events/bath-profile/presentations/a-powell.ppt

[ppt] Bild 1 

File Format: Microsoft Powerpoint

Endnote. Reference Manager. URN:NBN Register Format. Export Formats. DiVA
Document Format ... Metadata & Content. Repository. Long-term storage packages ...
www.erpanet.org/events/2004/cork/presentations/PID_muller.ppt

[PDF] Bild 1 

File Format: PDF/Adobe Acrobat

metadata services and an archival copy, within the same framework. • Possibility to use the
same PID for different, manifestations of the same content ...
www.erpanet.org/events/2004/cork/presentations/PID_muller.pdf

darcusblog 

Some people at USQ use EndNote, but unless it gets OpenDocument support it ... use 
something like LiveClipboard to copy and paste metadata-enhanced content ... 
netapps.muohio.edu/blogs/darcusb/darcusb/page/2/

HubLog: The State of Biomedical PDFs

Finally there's the placement of the DOI, which should really be in the metadata but needs
at least to be in the PDF text; Nature's bizarre copy protection; ...
hublog.hubmed.org/archives/001306.html

Appendix IV. Digital Library Content and Course Management Systems ... 
IMPORT = into tool or managed environment, bring or point to content itself, or metadata
about content. SAVE = prior to publishing, make a copy for the ...
www.diglib.org/pubs/cmsdl0407/cmsdl0407app4-0.htm

[pdf] Appendix 4.0: Use Case Working Group: Report and Recommendations 
File Format: PDF/Adobe Acrobat

such applications as Powerpoint, Endnote, Adobe Acrobat, and weblogs. ... metadata
about content. SAVE = prior to publishing, make a copy for the desktop, ...
www.diglib.org/pubs/cmsdl0407/cmsdl0407app4-0.pdf

Catalogablog 

AIIM International announces XML standards initiative for the exchange of document 
images and related metadata. AIIM International, the Enterprise Content ... 
catalogablog.blogspot.com/2003_04_20_catalogablog_archive.html

[doc] Joint Information Systems Committee (JISC) 
File Format: Microsoft Word

This can be metadata about content resources or associated services. ... by content
creation software (Word, Acrobat, Dreamweaver, CityDesk, Endnote, ...
www.jisc.ac.uk/uploaded_documents/Metadata%20Generation%20for%20Resource%20Discovery%20ITT%20FINAL.doc

What's New 

DocuRights will allow publishers to disseminate PDF (Portable Document Format) file
content on the Web, without losing copy protection control of the ...
www1.kfinder.com/newweb/WhatsNew/whatsnew.html